PREFACE

In the design of phased-array radars, processing equipment and/or
radar power can be saved if sequential detection (multi-stage statistical
test) criteria are used. This Memorandum demonstrates theoretically
in what sense Wald's sequential testing is optimal. The study is novel
in that it shows that sequential testing is optimal in an information
theoretic sense.

The work was undertaken as basic research in technology applicable
to the design of electronically scanned radars of potential use in
ballistic missile defenses. It is part of a continuing study for ARPA
on low-altitude defense against ballistic missiles.

Dr. Julian J. Bussgang, co-author of this Memorandum, is President
of SIGNATRON, Inc., Lexington, Massachusetts, and is a Consultant to
The RAND Corporation.
SUMMARY


In this Memorandum some fundamental aspects of multi-stage tests
of alternate statistical hypotheses are discussed. Section II is
devoted to the formulation of the problem and the definition of the
quantities of interest. Section III demonstrates certain fundamental
equalities of the conditional distributions of the sample size which
occur in Wald's sequential probability ratio test. These equalities,
which to the authors' knowledge have not been noted before, imply that
the terminal decision is a sufficient statistic for the estimation
of the true hypothesis regardless of the terminal stage. In Section IV
a further consequence of these equalities is demonstrated. Using
information theoretic concepts, the rate of transmission of a statistical
test is defined and a test procedure, constructed to satisfy these
equalities, is shown to minimize this rate. The information theoretic
view of an alternate decision problem has been suggested before, but
only for a fixed sample test. The results in the Memorandum provide
an alternate approach to the study of the optimality of multi-stage
tests of alternate statistical hypotheses and suggest a criterion for
designing such tests based on the conditional distributions of the
sample size rather than on the average risk.
I. DEFINITION OF AN ALTERNATE HYPOTHESIS STATISTICAL TEST


As a general framework for alternate hypothesis statistical
tests involving a discrete sample we consider that there exists a
real valued Borel probability measure defined on $R^{\infty}$, where
$R^{\infty} = \prod_{i=1}^{\infty} R_i$ and $R_i = R$, $R$ being the real
line. This measure is known up to a parameter $\theta$ that can have one
of two values.* These values for $\theta$ form two hypotheses about the
measure which are characteristically denoted by $H_0$ (the null
hypothesis) and $H_1$ (the alternate hypothesis). Similarly, we shall
denote the corresponding measures by $\mu_0$ and $\mu_1$. We assume also
that there exist a priori probabilities $\pi$ and $1-\pi$ that the
measures are $\mu_0$ and $\mu_1$ respectively. Each possible measure
generates a stochastic process, one under $\mu_0$ and one under $\mu_1$,
with elements $x \in R^{\infty}$ called paths, which are sequences of
real numbers.

In an actual statistical test there is some mechanism for
obtaining numbers called observations. We assume that they can be
obtained one at a time; obtaining the $n$-th observation will be called
the $n$-th stage of the test. When the observations are written in
order $x_1, \ldots, x_n$ (we call the sequence of observations a sample)
they represent the first $n$ values of a particular realization or path
of either the stochastic process generated by $\mu_0$ or the stochastic
process generated by $\mu_1$. A multi-stage alternate hypothesis test is
a decision procedure that uses the sample to determine, subject to
certain pre-established probabilities of error, to which of these
processes the path belongs. That is, it determines whether the
underlying measure is $\mu_0$ or $\mu_1$.

* We consider that the measure underlying the sample is one of
$\mu_\theta$ ($\theta = 0, 1$) in order to cast our problem as one of
parameter estimation. However, by parameter estimation we mean more than
estimating a parameter that appears in a distribution function that
might generate the measure, like the mean of a Gaussian distribution.
We view the parameter $\theta$ as an index for the two possible values
of the measure.

At each stage of the alternate hypothesis test one of three
decisions can be made: that the hypothesis $H_0$ is true, that the
hypothesis $H_1$ is true or that another observation should be made.

Trying  to  achieve  maximum  generality  we  impose  on  these  tests  only 
two  conditions:  that  at  each  stage  these  three  decisions  are 
mutually  exclusive,  and  that  with  probability  one  the  test  eventually 
leads  to  the  acceptance  of  one  of  the  two  hypotheses. 

Tests  will  be  said  to  have  the  same  power  if  they  have  the 
same  error  probabilities.  The  error  probabilities  are  denoted  as 
follows: 


$\alpha$ = probability that $H_1$ is accepted when $H_0$ is true

$\beta$ = probability that $H_0$ is accepted when $H_1$ is true


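For any alternate hypothesis test for which a decision is
made at each stage, the collection of paths that lead to the
acceptance of $H_0$ ($H_1$) at the $n$-th stage is a cylinder set in
$R^{\infty}$. This cylinder set will be denoted by $C_n^{*}$ ($C_n^{**}$).

We define

$$\mu_\theta^{*}(n) = \mu_\theta(C_n^{*}), \qquad \mu_\theta^{**}(n) = \mu_\theta(C_n^{**}), \qquad \theta = 0, 1,$$

that is, the measures of $C_n^{*}$ and of $C_n^{**}$ under $\mu_\theta$.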


The conditions imposed upon the tests insure that these measures
are well defined. For all tests of the same power (i.e., to which
there corresponds a specific pair $(\alpha, \beta)$), we see that under
the two conditions imposed above

$$\sum_{n=1}^{\infty} \mu_0^{*}(n) = 1-\alpha, \quad \sum_{n=1}^{\infty} \mu_1^{*}(n) = \beta, \quad \sum_{n=1}^{\infty} \mu_0^{**}(n) = \alpha \quad \text{and} \quad \sum_{n=1}^{\infty} \mu_1^{**}(n) = 1-\beta.$$

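Thus we can define four conditional probability density functions
$p_\theta^{*}(n)$ and $p_\theta^{**}(n)$, $\theta = 0, 1$. As an example we have

$$p_0^{*}(n) = \frac{\mu_0^{*}(n)}{\sum_{n=1}^{\infty} \mu_0^{*}(n)}$$

which is the probability that $H_0$ is accepted at the $n$-th stage given
that $H_0$ is the true hypothesis (and that $H_0$ is the hypothesis
eventually accepted).

Lastly, we term the acceptance of $H_0$ decision zero, $D_0$, and the
acceptance of $H_1$ decision one, $D_1$.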
II. IMPLICATIONS OF THE EQUALITIES $p_0^{*}(n) = p_1^{*}(n)$ AND $p_0^{**}(n) = p_1^{**}(n)$


We consider tests of the same power characterized by an $(\alpha, \beta)$ and
are concerned with the completed tests and the decision to which they
lead. For a completed test we have knowledge about two random variables,
the stage at which the test terminates and whether it terminates in
$D_0$ or $D_1$. This is the case regardless of what functions of the sample
are used to arrive at the decisions. By $P(H_i \mid D_j, n)$, $i, j = 0, 1$, we mean
the probability that $H_i$ is the true hypothesis given that the test
ended at the $n$-th stage with the acceptance of hypothesis $j$. The
probability $P(H_\theta \mid D_j, n)$ is the a posteriori probability of the true
hypothesis $H_\theta$ in a multi-stage alternate hypothesis test. There are
four functions of this kind; an example is

$$P(H_0 \mid D_1, n) = \frac{\pi \alpha\, p_0^{**}(n)}{\pi \alpha\, p_0^{**}(n) + (1-\pi)(1-\beta)\, p_1^{**}(n)} \tag{1}$$


So far we have attempted to portray alternate hypothesis
tests in their greatest generality; in practice it is common to use
a fixed sample or Wald sequential probability ratio test. In the
first case, $p_\theta^{*}(N) = p_\theta^{**}(N) = 1$, $\theta = 0, 1$, where $N$ is the
pre-assigned stage at which the test terminates. In the latter case the
functions $p_\theta^{*}(n)$ and $p_\theta^{**}(n)$ ($\theta = 0, 1$), $n = 1, 2, \ldots$, are
generally difficult to calculate. Experience indicates that unless the
$p_\theta^{*}(n)$, $p_\theta^{**}(n)$ can be obtained trivially, as in a fixed sample
test, their calculation is a major and frequently unsolvable problem.
It is useful to consider when the conditional probabilities that the
correct decision was made would be independent of the stage at which the
test terminates. From the form of expression (1) it is clear that
this is the case if and only if

$$p_0^{*}(n) = p_1^{*}(n), \qquad p_0^{**}(n) = p_1^{**}(n) \qquad (n = 1, 2, \ldots) \tag{2}$$


We have the following theorem:

Theorem 3.1 The a posteriori probability of satisfying either
hypothesis in a multi-stage test of alternate hypotheses is independent
of the stage at which the test ended if and only if (2) is satisfied.

Proof:

Sufficiency is obvious from expression (1); for necessity we
notice that expression (1) is independent of $n$ only if

$$\frac{p_1^{**}(n)}{p_0^{**}(n)} = \text{const.}$$

Since both $p_1^{**}(n)$ and $p_0^{**}(n)$ are probability measures, each sums
to one over $n$, so the constant must be one. The same argument applies
to $p_1^{*}(n)$ and $p_0^{*}(n)$.
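Theorem 3.1 can be illustrated numerically. The following is a minimal sketch, assuming hypothetical values for $\pi$, $\alpha$, $\beta$ and for the stage distributions; the function name `posterior_h0_given_d1` is introduced here for illustration only. It evaluates expression (1) stage by stage and shows that the result does not change with $n$ when $p_0^{**}(n) = p_1^{**}(n)$.

```python
def posterior_h0_given_d1(prior, alpha, beta, p0_ss, p1_ss):
    """Expression (1): P(H0 | D1, n) for a single stage n.

    prior -- a priori probability pi of H0
    alpha -- probability that H1 is accepted when H0 is true
    beta  -- probability that H0 is accepted when H1 is true
    p0_ss -- p_0**(n); p1_ss -- p_1**(n)
    """
    num = prior * alpha * p0_ss
    den = num + (1.0 - prior) * (1.0 - beta) * p1_ss
    return num / den


if __name__ == "__main__":
    # Hypothetical stage distribution shared by both hypotheses, as in (2).
    shared = [0.5, 0.3, 0.2]
    for n, p in enumerate(shared, start=1):
        print(n, posterior_h0_given_d1(0.5, 0.1, 0.2, p, p))
    # Hypothetical unequal stage distributions: the posterior now varies with n.
    for n, (p0, p1) in enumerate(zip([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]), start=1):
        print(n, posterior_h0_given_d1(0.5, 0.1, 0.2, p0, p1))
```

When (2) holds, the stage probabilities cancel and the posterior reduces to $\pi\alpha / [\pi\alpha + (1-\pi)(1-\beta)]$ at every stage.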


We employ Theorem 3.1 to demonstrate the statistical sufficiency
of the terminal decision of an alternate hypothesis test, when the
test procedure is such that (2) is satisfied and the test is used to
estimate $\theta$. The outcome of a specific test is a random variable
$\Gamma$ which takes on the values $\{(D_j, n) \mid j = 0, 1;\ n = 1, 2, \ldots\}$.
Let $T$ be a function of $\Gamma$ such that $T(D_j, n) = D_j$. Statistical
sufficiency can be defined by the following statement: (Ref. 2)

"If the conditional distribution of $\theta$ given $X = x$ depends only on
$T(x)$ then $T$ is a sufficient statistic for $\theta$." Thus in the problem
of observing the random variable $\Gamma$ and estimating $\theta$, by Theorem 3.1,
the statistic $T(D_j, n) = D_j$ is a sufficient statistic if and only
if (2) is satisfied.

It is common to speak of a statistic as being sufficient for
the estimation of a parameter of a stochastic process when the
statistic is a function of the observations of the process. Here
the statistic is a function of both the observations and the test
procedure which is chosen. It is clear that when an alternate
hypothesis test procedure is chosen, and the outcome of this test
procedure is used as an estimate for the parameter, considerable
information about the parameter contained in the observations might
be lost. The point of view that we take in this Memorandum is that
we are studying the outcome of a multi-stage alternate hypothesis
test, and not the composition of the sample. The only utilizable
information that these tests convey is the decision that they lead
to. Thus it is important to know when the terminal decision is a
sufficient statistic with respect to the true hypothesis. The fact
that (2) implies sufficiency of the test statistic establishes the
significance of the equalities (2).

We now show that (2) is satisfied by the Wald test. (This is
also true for a fixed sample size Neyman-Pearson test which satisfies
(2) as a trivial case.) Since the Wald test employs the likelihood
ratio, it is necessary to introduce additional assumptions on $\mu_0$
and $\mu_1$ to insure that this ratio exists. The likelihood ratio at
stage $n$ is a function of the first $n$ observations of a particular
sample path

$$\frac{d\mu_1(x_1, x_2, \ldots, x_n)}{d\mu_0(x_1, x_2, \ldots, x_n)}$$

This function exists as a Radon-Nikodym derivative as long as $\mu_1$ is
absolutely continuous with respect to $\mu_0$.

The Wald sequential probability ratio test is a multi-stage
test of alternate hypotheses that continues as long as

$$B < \frac{d\mu_1(x_1, \ldots, x_n)}{d\mu_0(x_1, \ldots, x_n)} < A \qquad (A > 1,\ B < 1 \text{ are constants}) \tag{3}$$

and ceases with the acceptance of $H_0$ if the left inequality is
violated and with the acceptance of $H_1$ if the right inequality is
violated.

There  is  a  fundamental  approximation  used  in  connection  with 
Wald  tests  that  is  frequently  referred  to  as  "neglecting  the  excess 
over  the  boundary".  This  approximation  consists  of  assuming  that 
when  the  sequential  test  terminates  there  is  equality  at  either  the 
left  side  or  right  side  of  (3).  The  approximation  becomes  exact 
when  the  sample  paths  are  continuous  with  independent  increments  and 
when  the  probability  density  function  for  the  value  of  each  increment 
is  continuous. 

It is well known that with this approximation $B$ is taken to
be $\dfrac{\beta}{1-\alpha}$ and $A$ is taken to be $\dfrac{1-\beta}{\alpha}$.
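The stopping rule (3) with these thresholds can be sketched in a few lines. The example below is only an illustration under stated assumptions: the observations are taken to be independent Gaussian variables with known standard deviation and with mean `mean0` under $H_0$ and `mean1` under $H_1$, and the function names are hypothetical. It works with the logarithm of the likelihood ratio, which turns the product over observations into a running sum.

```python
import math


def wald_thresholds(alpha, beta):
    """Wald's approximate thresholds: B = beta/(1-alpha), A = (1-beta)/alpha."""
    return beta / (1.0 - alpha), (1.0 - beta) / alpha


def sprt_gaussian(samples, mean0, mean1, sigma, alpha, beta):
    """Apply the rule (3) to an iterable of observations.

    Returns (decision, stage): decision 0 accepts H0, decision 1 accepts H1,
    and decision None means the observations ran out before a boundary
    was reached. Assumes independent Gaussian observations (an assumption
    of this sketch, not of the general theory).
    """
    b, a = wald_thresholds(alpha, beta)
    log_b, log_a = math.log(b), math.log(a)
    llr = 0.0  # running log-likelihood ratio, log(d mu_1 / d mu_0)
    n = 0
    for n, x in enumerate(samples, start=1):
        # Contribution of one Gaussian observation to the log-likelihood ratio.
        llr += ((x - mean0) ** 2 - (x - mean1) ** 2) / (2.0 * sigma ** 2)
        if llr <= log_b:   # left inequality of (3) violated
            return 0, n
        if llr >= log_a:   # right inequality of (3) violated
            return 1, n
    return None, n
```

With `alpha = beta = 0.05` the thresholds are $A = 19$ and $B = 1/19$, so a run of observations equal to `mean1 = 1` (with `mean0 = 0`, `sigma = 1`) adds 0.5 to the log-likelihood ratio at each stage and the test stops with the acceptance of $H_1$ once the sum exceeds $\log 19 \approx 2.94$.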

In the terminology of this Memorandum, Wald's approximation
consists of saying that for those paths which lead to $D_0$ ($D_1$) at the
$n$-th stage

$$\frac{d\mu_1(x_1, \ldots, x_n)}{d\mu_0(x_1, \ldots, x_n)} = \frac{\beta}{1-\alpha} \quad \left( \frac{1-\beta}{\alpha} \right) \tag{4}$$

and that this is true for all $n$. We will assume that for all paths
in $C_n^{*}$ ($C_n^{**}$) the likelihood ratio at the $n$-th stage is constant, but
that the constant can be different for each $n$. Our assumption is
meaningful whenever Wald's assumption is. Of course since we
consider more general measures there are cases where the assumption
will not agree with reality. The following theorem shows how
important the assumption is and displays some of the special
properties of the Wald test.


Theorem 3.2 Assume that

$$\frac{d\mu_1(x_1, \ldots, x_n)}{d\mu_0(x_1, \ldots, x_n)}$$

is constant for all paths in $C_n^{*}$ and $C_n^{**}$, and consider alternate
hypothesis tests in which the function of the observations at the
$n$-th stage that is used to perform the estimation is

$$\frac{d\mu_1(x_1, \ldots, x_n)}{d\mu_0(x_1, \ldots, x_n)}.$$

Then (2) is satisfied if and only if the test is a Wald test.


Proof:

Suppose we have a Wald test and $(x_1, \ldots, x_n) \in C_n^{**}$, then

$$\frac{d\mu_1(x_1, \ldots, x_n)}{d\mu_0(x_1, \ldots, x_n)} = \frac{1-\beta}{\alpha}.$$

Integrating over $C_n^{**}$ we have

$$\int_{C_n^{**}} d\mu_1(x) = \frac{1-\beta}{\alpha} \int_{C_n^{**}} d\mu_0(x)$$

so $p_1^{**}(n) = p_0^{**}(n)$. A similar result holds for paths in $C_n^{*}$.

Now let $p_1^{**}(n) = p_0^{**}(n)$. Then $\mu_1^{**}(n) = \dfrac{1-\beta}{\alpha}\, \mu_0^{**}(n)$.
We write this as

$$\int_{C_n^{**}} d\mu_1(x) = \frac{1-\beta}{\alpha} \int_{C_n^{**}} d\mu_0(x) \tag{5}$$

where by $x$ we mean the cylinder set represented by $(x_1, \ldots, x_n)$.
Since we assume that

$$\frac{d\mu_1(x)}{d\mu_0(x)}$$

is a constant for $x \in C_n^{**}$ it follows that

$$\frac{d\mu_1(x)}{d\mu_0(x)} = \frac{1-\beta}{\alpha}$$

for all tests that lead to $D_1$ at the $n$-th stage. A similar result
holds if $p_1^{*}(n) = p_0^{*}(n)$. Thus the test is a Wald test and the
theorem is proven.
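The first half of this proof can be checked on a case in which the likelihood ratio is exactly constant on the terminal sets. The sketch below assumes, purely for illustration, observations equal to $+1$ or $-1$, with probability `p_up` of $+1$ under $H_1$ and `1 - p_up` under $H_0$, and symmetric boundaries `k` steps away from the start; the function names are hypothetical. Every path that first reaches $+k$ has likelihood ratio $[p/(1-p)]^k$, so there is no excess over the boundary, and the normalised stage distributions under the two hypotheses should coincide.

```python
def absorption_by_stage(p_up, k, n_max):
    """First-passage probabilities of a +/-1 walk started at 0.

    Returns (upper, lower) where upper[n-1] is the probability of first
    reaching +k at stage n (acceptance of H1) and lower[n-1] the
    probability of first reaching -k at stage n (acceptance of H0).
    """
    dist = {0: 1.0}  # probability of each interior position, test still running
    upper, lower = [], []
    for _ in range(n_max):
        nxt = {}
        for pos, prob in dist.items():
            for step, weight in ((1, p_up), (-1, 1.0 - p_up)):
                nxt[pos + step] = nxt.get(pos + step, 0.0) + prob * weight
        upper.append(nxt.pop(k, 0.0))    # paths absorbed at the upper boundary
        lower.append(nxt.pop(-k, 0.0))   # paths absorbed at the lower boundary
        dist = nxt
    return upper, lower


def normalise(mu):
    """Turn stage measures mu(n) into a distribution over the stages computed."""
    total = sum(mu)
    return [m / total for m in mu]


if __name__ == "__main__":
    up1, lo1 = absorption_by_stage(0.7, 3, 40)   # measures under H1
    up0, lo0 = absorption_by_stage(0.3, 3, 40)   # measures under H0
    gap = max(abs(a - b) for a, b in zip(normalise(up1), normalise(up0)))
    print("largest difference between p_1**(n) and p_0**(n):", gap)
```

The normalisation here is over the first `n_max` stages only, which does not affect the comparison because the ratio of the two measures is the same constant at every stage.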
It is obvious that for any alternate hypothesis test of fixed
sample size (2) is satisfied. This shows the importance of the
assumption of the constancy of the likelihood ratio for proving the
converse of Theorem 3.2.


III.  INFORMATION  THEORETIC  APPROACH 


By regarding a statistical test of alternate hypotheses as a
problem of transmitting messages over a noisy channel and by defining
the information rate per decision we are able to provide additional
insight into the nature of these tests. In particular, we
are able to interpret the optimality of the Wald test from the
information theoretic point of view. We first restate the basic
formalism of information theory. (Ref. 3)


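Let $X$, $Y$ and $Z$ be three discrete random variables which occur
together and let $\{x_i\}$, $i = 1, \ldots, l$, be the set of the $l$ possible
different values of $X$, $\{y_j\}$, $j = 1, \ldots, m$, be the set of the $m$ possible
different values of $Y$, and $\{z_k\}$, $k = 1, \ldots, n$, be the set of the $n$
possible different values of $Z$. Denote the probability of the joint
occurrence of $x_i$ for $X$, $y_j$ for $Y$ and $z_k$ for $Z$ by $p[X = x_i, Y = y_j, Z = z_k]$.
The joint entropy of $X$, $Y$ and $Z$ is then defined by

$$H(X,Y,Z) = -\sum_{i,j,k=1}^{l,m,n} p[X = x_i, Y = y_j, Z = z_k] \log p[X = x_i, Y = y_j, Z = z_k] \tag{6}$$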


The logarithmic base in this expression and in those that follow
is the same but is otherwise arbitrary; the choice of the base
corresponds to the choice of a unit for measuring entropy and is
usually base 2. A change in the logarithmic base introduces only
a multiplicative scale factor which is of no consequence in this
work. The joint entropy of $X$ and $Y$ and the entropy of $X$ alone are
defined by

$$H(X,Y) = -\sum_{i,j=1}^{l,m} p[X = x_i, Y = y_j] \log p[X = x_i, Y = y_j] \tag{7}$$

and

$$H(X) = -\sum_{i=1}^{l} p[X = x_i] \log p[X = x_i] \tag{8}$$

in which $p[X = x_i, Y = y_j]$ is the joint probability that $X = x_i$ and
$Y = y_j$ and $p[X = x_i]$ is the probability that $X = x_i$.

Suppose that an information source can, by some random mechanism,
generate one of two messages which are indexed 0 and 1. The
index of the message actually generated at a particular time is
taken as the random variable $X$. The entropy $H(X)$ is said to measure
the amount of information contained in a message generated by that
source. If the information source feeds a noisy channel, the receiver
at the output of the channel receives the message corrupted by noise.
The receiver decodes the message, i.e., estimates whether $X$ was 0
or 1. The estimate of $X$ made by the receiver is the random variable
$Y$ which has the same two possible values as $X$. The entropy $H(Y)$ is
said to measure the amount of information generated by the receiver.
The rate of transmission $R(X;Y)$ is defined as the sum of the amount
of information generated by the source and the amount of information
generated by the receiver minus the amount of information common to
both the transmitter and the receiver

$$R(X;Y) = H(X) + H(Y) - H(X,Y) \tag{9}$$
If the channel is so noisy that the variables $X$ and $Y$ are independent,
$H(X,Y) = H(X) + H(Y)$ and the rate is zero. If the channel is not noisy
at all, $X$ is the same as $Y$, $H(X,Y) = H(X) = H(Y)$ and $R(X;Y)$ is equal
to $H(X)$. In effect $R(X;Y)$ measures that part of information generated
by the source that must reach the receiver in order that the receiver
generate an amount of information $H(Y)$. The quantity $H(X) - R(X;Y)$ is
called the "equivocation" of $X$ given $Y$ and measures the amount of unwanted
information reaching the receiver that is generated by channel noise.
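The two limiting cases just described follow directly from definitions (7) to (9). A minimal sketch, assuming base-2 logarithms and a joint distribution supplied as a nested list (the function names are illustrative, not from the Memorandum):

```python
import math


def entropy(probs):
    """Entropy in bits of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def rate_of_transmission(joint):
    """R(X;Y) = H(X) + H(Y) - H(X,Y), with joint[i][j] = p[X = x_i, Y = y_j]."""
    p_x = [sum(row) for row in joint]            # marginal distribution of X
    p_y = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    p_xy = [p for row in joint for p in row]     # flattened joint distribution
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)


if __name__ == "__main__":
    # X and Y independent: the rate is zero.
    print(rate_of_transmission([[0.25, 0.25], [0.25, 0.25]]))
    # Noiseless channel, Y identical to X: the rate equals H(X), one bit here.
    print(rate_of_transmission([[0.5, 0.0], [0.0, 0.5]]))
```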

Next, these basic concepts of information theory are applied
to statistical tests of alternate hypotheses. We have two distinct
hypotheses $H_0$ and $H_1$ which occur with a priori probabilities $\pi$ and
$1-\pi$. The random variable $X$ is the index of the true hypothesis so
that $p[X = 0] = \pi$ and $p[X = 1] = 1-\pi$. The statistical test can terminate
with the acceptance of the hypothesis $H_0$ which is the decision $D_0$,
or with the acceptance of the hypothesis $H_1$ which is the decision $D_1$.

The random variable $Y$ is the index of the accepted decision. If $\alpha$
and $\beta$ are the specified probabilities of errors, we have
$\alpha = p[Y = 1 \mid X = 0]$ and $\beta = p[Y = 0 \mid X = 1]$. The particular stage $N$ at
which the test can terminate is also a random variable depending
on the particular sample and on the test procedure. The relevant
conditional probabilities that the test will terminate at a particular
stage $n$ are denoted by

$$\begin{aligned}
p_0^{*}(n) &= p[N = n \mid X = 0, Y = 0] \\
p_1^{*}(n) &= p[N = n \mid X = 1, Y = 0] \\
p_0^{**}(n) &= p[N = n \mid X = 0, Y = 1] \\
p_1^{**}(n) &= p[N = n \mid X = 1, Y = 1]
\end{aligned} \tag{10}$$




where $n$ is a positive integer.

The rate $R(X;Y,N)$ of information per decision, i.e., the amount of
information which must reach the receiver in order that the estimate
$Y$ of $X$ can be achieved with error probabilities $\alpha$ and $\beta$ at the
$N$-th stage is, by an extension of (9)

$$R(X;Y,N) = H(X) + H(Y,N) - H(X,Y,N) \tag{11}$$

The equivocation of $X$ given $Y$ and $N$, which measures the amount of
information required to estimate $Y$ at the $N$-th stage, is

$$H(X) - R(X;Y,N) = H(X,Y,N) - H(Y,N)$$


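Substituting in (11) from (6), (7), (8) and (10), we find that
$R(X;Y,N)$ can be written in the form

$$R(X;Y,N) = Q(\pi) - \eta\, Q\!\left(\frac{(1-\alpha)\pi}{\eta}\right) - (1-\eta)\, Q\!\left(\frac{\alpha\pi}{1-\eta}\right) + G[\pi, 1-\alpha, \beta, p_0^{*}(n), p_1^{*}(n)] + G[\pi, \alpha, 1-\beta, p_0^{**}(n), p_1^{**}(n)] \tag{12}$$

in which $Q(t) = -t \log t - (1-t) \log (1-t)$, $\eta = (1-\alpha)\pi + \beta(1-\pi)$ and $G(\cdot)$
is a function expressing the dependence of $R(X;Y,N)$ on the terminal
stage,

$$G[\pi, 1-\alpha, \beta, p_0^{*}(n), p_1^{*}(n)] = \sum_{n=1}^{\infty} \Big\{ -\big[(1-\alpha)\pi\, p_0^{*}(n) + \beta(1-\pi)\, p_1^{*}(n)\big] \log \frac{(1-\alpha)\pi\, p_0^{*}(n) + \beta(1-\pi)\, p_1^{*}(n)}{\eta} + (1-\alpha)\pi\, p_0^{*}(n) \log p_0^{*}(n) + \beta(1-\pi)\, p_1^{*}(n) \log p_1^{*}(n) \Big\} \tag{13}$$

The second $G$ term in (12) is obtained from (13) by replacing $1-\alpha$,
$\beta$, $\eta$ and the $p_\theta^{*}(n)$ by $\alpha$, $1-\beta$, $1-\eta$ and the $p_\theta^{**}(n)$.
The sum of the last two terms in (12) is in effect $R(N,X|Y)$.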
Each term of the summation in (13) is, up to a positive factor, of the
form $-\varphi(g_1 t_1 + g_2 t_2) + g_1 \varphi(t_1) + g_2 \varphi(t_2)$ where $g_1 > 0$, $g_2 > 0$,
$g_1 + g_2 = 1$ and $\varphi(t) = t \log t$ is a continuous convex function. We assume
that $\alpha < \tfrac{1}{2}$ and $\beta < \tfrac{1}{2}$ so that
$g_1 t_1 + g_2 t_2$ lies between $t_1$ and $t_2$. It follows from the convexity
of $\varphi$ that each such term is non-negative and is zero if and only if
$p_0^{*}(n) = p_1^{*}(n)$ and $p_0^{**}(n) = p_1^{**}(n)$ for all $n$. When a test
procedure is used such that both of these conditions are satisfied, the
minimum value of $R(X;Y,N)$ over all possible tests of power $(\alpha, \beta)$ is
achieved and we have

$$\operatorname{Min} R(X;Y,N) = Q(\pi) - \eta\, Q\!\left(\frac{(1-\alpha)\pi}{\eta}\right) - (1-\eta)\, Q\!\left(\frac{\alpha\pi}{1-\eta}\right) = H(X) - H(X|Y). \tag{14}$$

An alternate form of (14), obtained by rearranging the different terms,
is

$$\operatorname{Min} R(X;Y,N) = Q(\eta) - \pi\, Q(1-\alpha) - (1-\pi)\, Q(\beta) = H(Y) - H(Y|X).$$

These  results  are  expressed  in  the  following  theorem: 

Theorem 4.1 Among all the procedures for conducting a statistical
test of alternate hypotheses, the procedure which is designed to
satisfy the conditions $p_0^{*}(n) = p_1^{*}(n)$ and $p_0^{**}(n) = p_1^{**}(n)$ for all $n$
requires the minimum rate of information to attain the desired
probabilities of error $\alpha$ and $\beta$ for any a priori probability $\pi$ and
$1-\pi$. This minimum rate is given by (14).
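Theorem 4.1 can be checked numerically for stage distributions with finite support. The sketch below is an illustration under stated assumptions: the values of $\pi$, $\alpha$, $\beta$ and the stage distributions are hypothetical, logarithms are base 2, and the function names are introduced here. It builds the joint distribution of $(X, Y, N)$ from (10), evaluates $R(X;Y,N)$ from (11), and compares it with the alternate form of (14).

```python
import math


def entropy(probs):
    """Entropy in bits of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def rate_with_stage(prior, alpha, beta, p0s, p1s, p0ss, p1ss):
    """R(X;Y,N) of (11) for stage distributions of equal, finite length.

    p0s[n-1] = p_0*(n), p1s[n-1] = p_1*(n),
    p0ss[n-1] = p_0**(n), p1ss[n-1] = p_1**(n).
    """
    stages = range(len(p0s))
    joint = {}  # joint[(x, y, n)] = p[X = x, Y = y, N = n]
    for n in stages:
        joint[(0, 0, n)] = prior * (1 - alpha) * p0s[n]
        joint[(1, 0, n)] = (1 - prior) * beta * p1s[n]
        joint[(0, 1, n)] = prior * alpha * p0ss[n]
        joint[(1, 1, n)] = (1 - prior) * (1 - beta) * p1ss[n]
    h_x = entropy([prior, 1 - prior])
    h_yn = entropy([joint[(0, y, n)] + joint[(1, y, n)]
                    for y in (0, 1) for n in stages])
    h_xyn = entropy(joint.values())
    return h_x + h_yn - h_xyn


def min_rate(prior, alpha, beta):
    """Alternate form of (14): Q(eta) - pi Q(1 - alpha) - (1 - pi) Q(beta)."""
    def q(t):
        return entropy([t, 1 - t])
    eta = (1 - alpha) * prior + beta * (1 - prior)
    return q(eta) - prior * q(1 - alpha) - (1 - prior) * q(beta)


if __name__ == "__main__":
    same = [0.5, 0.3, 0.2]
    print(rate_with_stage(0.5, 0.1, 0.2, same, same, same, same), min_rate(0.5, 0.1, 0.2))
    print(rate_with_stage(0.5, 0.1, 0.2, [0.7, 0.2, 0.1], [0.1, 0.2, 0.7], same, same))
```

When the equalities (2) hold the two printed values in the first line agree; when they fail, as in the second line, the rate exceeds the minimum by the contribution of the $G$ terms in (12).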




Corollary 4.1 For a sample consisting of independent and identically
distributed variables, the Wald test requires the least rate of
information to attain the desired probabilities of error $\alpha$ and $\beta$ for any
given a priori probabilities $\pi$ and $1-\pi$.

This Corollary follows from the fact that by Theorem 3.2 the
Wald test satisfies the conditions of Theorem 4.1.

An interesting qualitative argument can be based on Theorem 4.1.

It is plausible to suppose that the amount of information in a sample
is a monotonically increasing function of the average sample size.

This assumption together with Theorem 4.1 implies that the test
procedure designed to satisfy the conditions of Theorem 4.1 requires
the smallest average sample size to provide a statistical
test with the power $(\alpha, \beta)$.

The result (14) also implies this additional Theorem:

Theorem 4.2 When the test procedure is designed to satisfy the
conditions $p_0^{*}(n) = p_1^{*}(n)$ and $p_0^{**}(n) = p_1^{**}(n)$ for all $n$, the rate of
transmission $R(X;Y,N)$ does not depend on the terminal stage $N$.

Proof:

We observe by writing out $R(X;Y)$ in terms of $\alpha$ and $\beta$ that
$\operatorname{Min} R(X;Y,N) = R(X;Y)$.

This theorem is a complementary result of the notion of
sufficiency discussed in the previous section. Another result which
is less obvious can be stated in the form of the following Theorem:

Theorem 4.3 Consider two different test procedures which have
probabilities of error less than 0.5. If these test procedures
require the same rate of information per decision [but only one
procedure is designed to satisfy (2)] the procedure that satisfies
(2) cannot have probabilities of error $(\alpha, \beta)$ both larger than the
corresponding probabilities of error of the other test. This holds
for any a priori probabilities $\pi$ and $1-\pi$.

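Proof:

Let $\alpha'$ and $\beta'$ be the probabilities of error of the first and
second kind of the test that satisfies (2) and $\alpha$ and $\beta$ the
corresponding probabilities of the other test. The information rate per
decision of the test that satisfies (2) is by (14)

$$R' = Q(\pi) - \eta'\, Q\!\left(\frac{(1-\alpha')\pi}{\eta'}\right) - (1-\eta')\, Q\!\left(\frac{\alpha'\pi}{1-\eta'}\right)$$

in which

$$\eta' = (1-\alpha')\pi + \beta'(1-\pi)$$

The information rate of the other test is given by (12). Since
both rates are assumed to be equal and the $G(\cdot)$ functions are
positive, we must have

$$Q(\pi) - \eta\, Q\!\left(\frac{(1-\alpha)\pi}{\eta}\right) - (1-\eta)\, Q\!\left(\frac{\alpha\pi}{1-\eta}\right) < Q(\pi) - \eta'\, Q\!\left(\frac{(1-\alpha')\pi}{\eta'}\right) - (1-\eta')\, Q\!\left(\frac{\alpha'\pi}{1-\eta'}\right) \tag{15}$$

Suppose we assume that

$$0.5 > \alpha' \ge \alpha, \qquad 0.5 > \beta' \ge \beta \tag{16}$$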
This assumption implies the following inequalities (see Fig. 1)

$$\frac{(1-\alpha)\pi}{\eta} \ge \frac{(1-\alpha')\pi}{\eta'} \ge \pi \ge \frac{\alpha'\pi}{1-\eta'} \ge \frac{\alpha\pi}{1-\eta}$$

for any $0 < \pi < 1$. Let $P_1$, $P_1'$, $P$, $P_2'$, $P_2$ be the points on the curve
$y = Q(t)$ corresponding to $t = (1-\alpha)\pi/\eta$, $(1-\alpha')\pi/\eta'$, $\pi$, $\alpha'\pi/(1-\eta')$,
$\alpha\pi/(1-\eta)$. Since $Q(t)$ is continuous, concave and non-linear, the
chord $P_1' P_2'$ lies above the chord $P_1 P_2$ except when both equality
signs in (16) hold and the chords coincide. Suppose $R$ and $R'$
are the points on the chords $P_1 P_2$ and $P_1' P_2'$ corresponding to $t = \pi$.

Then $P$ lies above $R'$ and $R'$ lies above $R$ except when $\alpha = \alpha'$ and
$\beta = \beta'$, in which case $R$ and $R'$ coincide. Let $PR$ be the distance
from $P$ to $R$. The inequality (16) therefore implies $PR \ge PR'$.

But the left-hand side of (15) is the distance $PR$ and the right-hand
side of (15) is the distance $PR'$. Thus (15) represents the
inequality $PR < PR'$ which therefore cannot be achieved under
condition (16). Conversely the inequality (16) contradicts (15).
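The geometric step in this proof amounts to saying that the rate (14) cannot increase when both error probabilities are raised within the range below 0.5. A rough numerical check of that statement on a grid, assuming base-2 logarithms and hypothetical grid values (the function names are illustrative), might look as follows.

```python
import itertools
import math


def q(t):
    """Q(t) = -t log t - (1 - t) log(1 - t) in bits, with Q(0) = Q(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1.0 - t) * math.log2(1.0 - t)


def min_rate(prior, alpha, beta):
    """Right-hand side of (14) for a test of power (alpha, beta)."""
    eta = (1.0 - alpha) * prior + beta * (1.0 - prior)
    return (q(prior)
            - eta * q((1.0 - alpha) * prior / eta)
            - (1.0 - eta) * q(alpha * prior / (1.0 - eta)))


def rate_never_increases(errors, priors):
    """True if, on the grid, raising both error probabilities never raises (14)."""
    for prior in priors:
        for a, b, a2, b2 in itertools.product(errors, repeat=4):
            if a2 >= a and b2 >= b:
                if min_rate(prior, a2, b2) > min_rate(prior, a, b) + 1e-12:
                    return False
    return True


if __name__ == "__main__":
    grid = [0.05, 0.15, 0.25, 0.35, 0.45]
    print(rate_never_increases(grid, [0.2, 0.5, 0.8]))
```

A grid check of this kind is not a proof; it only illustrates the inequality $PR \ge PR'$ that the chord argument establishes.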
IV. CONCLUSIONS

It is important to notice the difference between the usual
communication problem and the decision problem. In the communication
problem the channel is specified, and one desires to maximize the
rate of transmission. This is achieved through the coding of messages
which is said to match the information source to the channel. The
maximum rate of transmission (with respect to all the admissible
sources) that can be achieved for a particular channel is known as
the capacity of the channel. By contrast, in the decision problem
the experimenter assumes a priori the hypotheses $H_0$ and $H_1$ but
not the test procedure; thus the information source rather than the
channel is specified. The test procedure (i.e., the test statistic
and the decision regions) that plays the part of the channel can be
chosen by the experimenter. The probabilities of error determine
the amount of information which must be generated by the receiver.

The relevant design problem is now to select that test procedure
that requires least information to complete the test, i.e., that
minimizes the rate of transmission. We might consider this as the
problem of matching the channel to the source.

We find that the Wald test not only minimizes the average
risk but also minimizes the rate of transmission independently of
the a priori probabilities. The proof of the optimality of the
Wald test in the sense of minimum average risk applies only to the
alternate hypotheses tests on identically distributed, independent
samples. It is suggestive to apply Theorem 4.1 to the
design of multi-stage statistical tests of alternate hypotheses
even in the case of correlated and non-identically distributed
observations by requiring that the test procedure be constructed
to satisfy (2). This rule of construction would determine the
boundaries of the proper decision regions which need not be parallel
lines. Another extension of Theorem 4.1 applies to the design of
multi-stage statistical tests of multiple hypotheses where by
analogy to the case of two hypotheses, the optimum decision rule
would be specified by the relevant equalities among conditional
probabilities at each stage.